A Provably Correct Learning Algorithm for Latent-Variable PCFGs
نویسندگان
چکیده
We introduce a provably correct learning algorithm for latent-variable PCFGs. The algorithm relies on two steps: first, the use of a matrix-decomposition algorithm applied to a co-occurrence matrix estimated from the parse trees in a training sample; second, the use of EM applied to a convex objective derived from the training samples in combination with the output from the matrix decomposition. Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
منابع مشابه
Spectral learning of latent-variable PCFGs: algorithms and sample complexity
We introduce a spectral learning algorithm for latent-variable PCFGs (Matsuzaki et al., 2005; Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides statistically consistent parameter estimates. Our result rests on three theorems: the first gives a tensor form of the inside-outside algorithm for PCFGs; the second shows that the required tensors ...
متن کاملExperiments with Spectral Learning of Latent-Variable PCFGs
Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which—unlike the EM algorithm—is guaranteed to give consistent parameter estimates (it has PAC-style guarantees of sample complexity). This paper describes experiments using the spectral algorithm. W...
متن کاملDiversity in Spectral Learning for Natural Language Parsing
We describe an approach to create a diverse set of predictions with spectral learning of latent-variable PCFGs (L-PCFGs). Our approach works by creating multiple spectral models where noise is added to the underlying features in the training set before the estimation of each model. We describe three ways to decode with multiple models. In addition, we describe a simple variant of the spectral a...
متن کاملTensor Decomposition for Fast Parsing with Latent-Variable PCFGs
We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for t...
متن کاملSpectral Learning of Latent-Variable PCFGs
Jeju, Republic of Korea, 8-14 July 2012. c ©2012 Association for Computational Linguistics Spectral Learning of Latent-Variable PCFGs Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar Dept. of Computer Science, Columbia University Dept. of Statistics/Dept. of Computer and Information Science, University of Pennsylvania {scohen,stratos,mcollins}@cs.columbia.edu, foster...
متن کامل